Exploratory Data Analysis vs. Confirmatory Data Analysis

December 03, 2021

Data analysis is how we extract meaningful insights from data. Two terms come up again and again when people talk about how to approach it: exploratory data analysis (EDA) and confirmatory data analysis (CDA).

EDA and CDA serve different purposes and differ in methodology and statistical rigor. In this blog post, we'll walk you through the difference between the two and help you identify when to use each one.

Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) is used to discover patterns, relationships, and trends in data. It is an open-ended approach in which the analyst creates plots, tables, and charts to get a sense of the data and to spot patterns worth exploring further. EDA is deliberately loose: it requires neither perfectly clean data nor a specific hypothesis to test.

EDA's primary goal is to understand the underlying structure of the data: how variables relate to one another, which patterns recur, and where outliers and other anomalies sit. Common EDA tools include histograms, scatterplots, summary tables, and other graphs.
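
To make this concrete, here is a minimal Python sketch of a first EDA pass that uses only the standard library. The `orders` sample and the helper names (`summarize`, `iqr_outliers`, `text_histogram`) are made up for illustration, and the 1.5 x IQR cutoff is a common rule of thumb, not something this post prescribes.

```python
import statistics

# Hypothetical sample (not real data): daily order counts with one odd spike.
orders = [12, 15, 14, 13, 16, 15, 14, 17, 13, 95, 14, 16]
BIN_WIDTH = 5  # assumed bin size for the crude histogram


def summarize(values):
    """Return a few descriptive statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def text_histogram(values, bin_width=BIN_WIDTH):
    """Count values per bin, keyed by the lower edge of each bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


summary = summarize(orders)
outliers = iqr_outliers(orders)
histogram = text_histogram(orders)

print(summary)
for start, count in histogram.items():
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * count}")
print("possible outliers:", outliers)
```

In this toy sample the single spike pulls the mean well above the median, which is the kind of thing a quick exploratory pass is meant to surface before any formal test is chosen.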

When to use EDA

Exploratory data analysis is most useful when you want the "big picture" of a dataset without making assumptions about the underlying population. It helps analysts form hypotheses, which they can then test with more formal methods later in the analysis. EDA is the first step in most data science projects because it builds familiarity with the dataset and can unearth hidden connections that might otherwise go undetected.

Confirmatory Data Analysis (CDA)

Confirmatory data analysis (CDA) is a structured approach to data analysis that starts from a set of pre-existing hypotheses. It is designed to test those hypotheses explicitly and to draw conclusions from the data in a scientifically rigorous manner. CDA relies on sampling and statistical modeling to produce confidence intervals and measures of statistical significance.
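
As one illustration of what "providing a confidence interval" can look like, the sketch below estimates a 95% interval for a mean with a percentile bootstrap. The bootstrap is swapped in here because it needs nothing beyond Python's standard library; it is only one of several ways to build an interval. The `scores` sample and the function name `bootstrap_mean_ci` are hypothetical.

```python
import random
import statistics

# Hypothetical sample (not real data): test scores from one group.
scores = [18, 20, 17, 19, 21, 18, 22, 19]


def bootstrap_mean_ci(sample, confidence=0.95, n_resamples=5_000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]


lower, upper = bootstrap_mean_ci(scores)
print(f"sample mean: {statistics.fmean(scores):.2f}")
print(f"approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

With a sample this small the interval is rough at best; the point is only to show the shape of the calculation.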

Compared with EDA, CDA is more formal. It relies on established statistical techniques such as hypothesis tests, ANOVA, regression analysis, and other statistical models.
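
A hypothesis test can be sketched in a few lines. The example below uses a two-sided permutation test for a difference in means, chosen because it runs on the standard library alone; it stands in for the t-tests or ANOVA you would more typically reach for in a statistics package. The `control` and `treatment` samples and the function name `permutation_test` are assumptions made for illustration.

```python
import random
import statistics

# Hypothetical samples (not real data): an outcome measured in two groups.
control = [12, 15, 14, 13, 16, 15, 14, 17]
treatment = [18, 20, 17, 19, 21, 18, 22, 19]


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: group labels are exchangeable (no difference).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to both counts so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


observed, p_value = permutation_test(treatment, control)
print(f"observed difference in means: {observed:.2f}")
print(f"estimated p-value: {p_value:.4f}")
```

The key confirmatory habit is that the hypothesis (here, "the two groups have the same mean") and the significance threshold are fixed before looking at the result, not after.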

When to use CDA

Use CDA when you have a specific question to answer and a specific hypothesis or model in mind that you want to test or confirm. Unlike EDA, CDA offers statistical precision, because hypothesis testing requires a planned, well-defined treatment of the data. It is therefore most useful for high-stakes decision-making and in situations where you need a high degree of confidence in your results.

Conclusion

EDA and CDA each have their own purpose and methodology in data analytics. When you want to explore your data, identify patterns, tell stories, or discover trends, EDA is the most useful; when you need to test hypotheses or confirm assumptions, CDA is the go-to. In practice the two approaches are often interdependent: EDA may lead to CDA, and CDA may lead to further EDA.

Thank you so much for reading. Let's explore and confirm data with confidence.

© 2023 Flare Compare